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ABSTRACT 

We demonstrate a fast spin-s spherical harmonic transform algorithm, which 
is flexible and exact for band-limited functions. In contrast to previous work, 
where spin transforms are computed independently, our algorithm permits the 
computation of several distinct spin transforms simultaneously. Specifically, only 
one set of special functions is computed for transforms of quantities with any spin, 
namely the Wigner d-matrices evaluated at 7r/2, which may be computed with 
efficient recursions. For any spin the computation scales as 0{L^) where L is the 
band-limit of the function. Our publicly available numerical implementation per- 
mits very high accuracy at modest computational cost. We discuss applications 
to the Cosmic Microwave Background (CMB) and gravitational lensing. 

Subject headings: cosmic background radiation, polarization, gravitational lensing: 
weak, methods: numerical 
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Introduction 



Spin-weighted spherical harmonic functions generahze the standard spherical harmonics 
to allow the representation and decomposition of spin-weighted functions on the sphere. 
Introduced to s tudy gravitational radiation and the electron ' s wavefunct ion in a magnetic 



monopole field ( jNewman fc Penrose 



1966 



Wu fc Yang 



1976 



Dray 



19851 ). spin harmonics 



have several applications in astrophysics and cosmology. In particular, they are useful for 
statistical studies of orientation-dependent signals on the celestial sphere, such as cosmic 
shear measurements of lensed galaxies, cosmic microwave background (CMB) polarization, 
and gravitational lensing of polarized radiation, among others. 

A spin-s function on the sphere, which transforms under a local rotation by angle if) as 
/ — )■ e*'"^/, may be decomposed as 

L m=l 



fio, 



/ J / J s^lra s^lr. 



;i) 



/=0 m=-l 

using the set of spin spherical harmonics ^imiQ^ 0), complex- valued functions on the sphere. 
Standard scalar spherical harmonics are represented by s = 0; from these we obtai n the 



spin harmonics using spin-raising and lowering operators (e.g. 



Goldberg et al. 



19671) • The 



decomposition depends on the orthogonality and completeness relationships 



«r/ 



sllm\^-,H)} s^Vm'K^-. 



Yii^A 



0eS2 



J2sYUe',<p'),Y, 



ImK"^' 



6{(j)' -(j))6{cos 6' -COS 6). 



(2) 



Im 



For a band-limited function with sdim 7^ only for / < L, the brute force computation 
of the transform requires 0{L^) operations: roughly, there are ~ L^ terms in the sum for 
each of the ~ L^ points on the sphere required to Nyquist sample the function. In this 
work, we describe a "fast" transform, which performs the analysis and synthesis for spin-s 
functions in 0{L^) operations on an equiangular grid. 



-4- 



Several methods e xist for spin transforms on the sphere, but our method offers 



some advantages. The 



DriscoU fc Healyl (119941 ) algorithm for a scalar spherical harmonic 



transform scales very well, as 0{L'^ log^ L), and relates the req uired Legendr e trans form to 



Wiaux et al. 



(120071 ) used 



a fast Fourier transform (FFT) on a specific equiangular grid. 
that method and further relations between the scalar harmonics and the s = ±2 harmonics, 
implementing spin transforms at the same scaling. The difficulty with these methods is 
that they require a large amount of memory to store precomputed special functions. These 
precomputed functions require 0{L^) operations and storage, and so spoil the efficiency for 



computing a single transform, but need to be computed only once, in principle. 



Others have computed spin harmonic transf orms on anot 



ler p i xelizatfon o 



Gorski et al.l 



2005 



Lewi 



3 



" the sphere 



20051 ). However, 



at 0{L^) scaling without precomputations (e.g. 
these algorithms use recursions which depend on spin, so the individual spin transforms 
must be computed independently. Our 0{L^) algorithm alleviates this difficulty, since the 
same recursion is used for all transforms, and multiple distinc t spin tra i isform s can be 



computed in a single pass of a general code. The approach in iMcEwenI ( 120081 ) is similar 



to ours, but differs in the details of the transforms, and suffers from numerical stability 
problems even at modest multipoles. Our approach is numerically stable to high multipoles. 

We present our algorithm in section [21 describing forward and inverse transforms, and 
our publicly-available numerical implementation. In section [3] we discuss our results and 
conclude in HI 



Method 



We exploit the well-known relationship (iGoldberg et al. 



19671 ) between the spin-s 



spherical harmonics and the Wigner (i-functions, which can be used to characterize rotations 
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in harmonic space: 



sYlm{6, I 



'21 + 1 



(3) 



The d-function is associated with an active right handed rotation about the y-axis by 6, a 
rotation which we can alternatively represent as a sequence of five rotations: (1) right by 
tt/2 about the z-axis, (2) right by 7r/2 about the y-axis, (3) right by 6 about the 2;-axis, (4) 
left by 7r/2 about the y-axis, and (5) left by tt/2 about the 2;-axis. Defining 



^mm' — ^mm'\^ I '^)i 



(4) 



this allows use to rewrite the Wigner d-functions as (lRisbdll996l ): 

m=+l 



d'MM'i^) 



■M-M 



' Yl ^mAfexp(-im^)Aj„jv^, 



(5) 



m=—l 



This seemingly complicated procedure has two advantages: (1) all rotations about the 
y-axis can be represented in terms of the Wigner (i-functions evaluated only at 71/2; and 
(2) sums over exponentials can be done efficiently with Fourier transforms. This particular 
factorization is a s pecial case of the rnore ge neral representation of convolution on the 



sphere presented in 



Wandelt fc Gorskil fl200lh . 



The A-matrices can be co mputed quickly and easily using recursion relations (JRisbo 



1996 



Trapani &: Navaza 



20061 ). Several symmetries allow us to improve the computational 



efficiency, so only ~ 1/8 of the matrix entries need to be computed (appendix lAl). 



2.1. Forward transform 



The forward (direct or analysis) spin-s spherical harmonic transform of a spin-s 
function f{9, 0) is defined as 



saim= I dOdct) ,Yi*^{d,(f))f{e,(j)) sin d. 



(6) 
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Using the relationship to Wigner d-functions, the transform becomes 



le,^€S^ V 47r ^ ^ 

m'=+l 



i-iy 



2/ + 1 

4:71 



?n+s 



Y] ^L'mAL'(-s) / dOdcj) exp{-im'e)exp{-im^)sm9f{e, 



m'=-l 
m'=+l 



,—lY\l : i^ '^ 2_^ '^■m'm^m'{~s)^n 

m'=-l 



47r 



(7) 



where we have ignored, for the moment, the details of evaluating the integral. 



^m'm 



= / d9d(f) exp(—im'9)exi){—im(f))sm9f{9, 



This is a quadrature problem in the form of a 2D Fourier integral. In appendix [B] we show 
that for a band-limited function these (2L + 1)^ integrals can be performed exactly in 
0{L'^ log L) operations by simply evaluating a two-dimensional FFT of a modified form of 
the function multiplied by an easily computed set of quadrature weights. 



Since each component of AL^/ can b e computed using C(l) floating point evaluations 



.e.g. 



Risbo 



1996 



Trapani fc Navaza 



20061 ). we have a formula that evaluates the spherical 



harmonic coefficients of the general spin s transform by evaluating a single sum over m' for 
each / and m, with asymptotic scaling 0{L^). 

A symmetry on the first azimuth index of the A allows a more efficient representation 
of the sum, which cuts the computation time in half: 



m'=+l m'=+l 

/ , ^m'rn^m'{-s)^rn'm = 2_^ ^m'm^m' {-s)Jrr 
m'=—l m'=0 



(9) 



where 



J rn.'r 



LOm 



J-m'm ' \ ^) -'l 



{—m')m 



if ?7i' = 
if m' > 



(10) 



and Jm'm is undefined for m' < 0. 
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The symmetry on the second azimuth index of the A allows us to rewrite the expression 
so that it requires only values from the A matrices in the non- negative quadrant, which 
allows more efficient use of the recursion and allows reuse of the ^m'm^in'i-s) pi'oduct for 
another modest efficiency gain. 

If / is real, the Fourier content is halved, and /(_m')(-m) = -^m'm; implying Jo(-m) = Jom 
and Jm'(-m) = (~l)™'^*'^m'm (^^^ "^' > 0)- ^^ ^^e this relationship to cut the computation 
time by another factor of two. Additionally, realness implies _sa/(_m) = sdlm^ providing 
some transforms for free. 



2.2. Inverse Transform 

The backward (inverse or synthesis) spherical harmonic transform maps the saim, 
s < I < L, into a function on the sphere 

f{e,<l)) = J2 saim sYlmiOA)- (11) 

Im 

Since the inverse transform does not involve an integral, issues of quadrature accuracy do 
not arise. We can again use the Wigner function relation to write this as 

/(^,0) = (-1)'^ J]exp(W^)exp(?m0)z'"+^ J]y^^A;_^,)(_,)Aj_^,)^,az™ 

mm' I 

= y ^eyip{im'9)exp{im(f)) Gm'm (12) 

mm' 

The m and m' sums can be computed using FFTs, and are sub-dominant to the scaling, 
using C(L^logL) operations. The FFTs will produce /(6',0) as a regularly sampled 
function on a 2-torus. Only half of this t orus i s of interest and the 6 > n portion can 



be ignored. An appendix to 



BasaketaL 



( I2OO8I ) also noted this algorithm for the inverse 



transform (but does not address the forward transform). 



Computation of 



Gm'm — {-lyi^ '^Z^ V ^ ^[-m'){-s)^{~m')m sdlm (13) 

again scales as 0{Li^). Mirror symmetries of the Wigner matrices again allow the expression 
to be rewritten using only the non-negative quadrant. The mirror symmetry on the first 
azimuthal index of A leads to 

Gm'm = (~1) G'(_m,')m (14) 

which cuts the computation time in half. For a transform to a real-valued function, the 
additional symmetry G'(_m')(-7n) = G*^,^ can again cut the computation time by two. 



2.3. Numerical implementation 

We implement these transforms for double precision complex-valued functions on the 
sphere. A C library is available from the author^. In this implementation, the equiangular 
(Equidistant Cylindrical Projection or ECP) pixelization is described by parameters A^^ 
and Ne, describing the number of pixels in each direction. The first pixel column is centered 
at = 0. The first and last rows each contain redundant pixels at the poles, 6' = and 
6 = TT, respectively. The FFT portions of the computation use the FFTW librarjo and are 
most efficient when the dimensions of the 6'-extended map (appendix [B]) are are powers of 
two. The dimensions of the extended map in our implementation are N^ and 2{Ne — 1). To 
support a function with limited bandwidth / < L requires A^,^, Ne >2L + 1. 

We verified the accuracy of our method in two ways. First, for selected low multipoles, 
we checked our transforms against the analytic expressions for single spin spherical 



^http://www. physics. miami.edu/~huffenbe/research/spinsfast 
^http://www. fftw.org 
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harmonics from iGoldberg et al.l (Il967| ). Second, we showed that our transforms are 
extremely accurate by comparing the spherical harmonic coefficients from a backward- 
forward transform pair to the original coefficients (Fig. [T]), up to a band-limit of L = 4096. 
We plot the rms of the relative difference between the original {sdim) and recovered (sSim) 
harmonic coefficients, \saim — s 0'im\/\sCiim\, for Gaussian random white noise, showing typical 
errors of ~ 10^^'^ which increase roughly proportional to the band limit. Comparison of 
the maximum relative errors shows that these transforms are 3-4 orde rs of magnitude mo re 



accurate than the (already very accurate) s = 0, ±2 transforms from 
onto a similar grid. 



Wiaux et al. 



(120071 ) 



HEALPixo (JGorski et al.ll2005l ). the very useful numerical grid and associated software 
suite for representing and manipulating functions on the sphere, has found widespread use 
in the astronomy and astrophysics community, particularly for CMB work. Under the 
same test, HEALPix transforms, which are approximate, show typical errors of ~ 10~^ (for 
resolution parameter A^side = L/2, a commonly used value). HEALPix provides an option 
to iterate the transform to converge to somewhat higher accuracy, if needed. 

Timing tests were performed on a single 2.83 GHz Xeon processor. Fig. |2] shows 
that the scaling of this implementation is approximately L^. The speed is comparable 
to or slightly faster than the similar arbitrary-spin transform computed by the HEALpix 
suite (version 2.11). The comparison includes the times for the recursive computation of 
special functions — associated Legendre functions for HEALPix and Wigner d-functions for 
our algorithm — and both may be accelerated by precomputing these quantities (at 0{L^) 
memory cost). 

For the HEALPix comparison, again we use A'side = L/2, which means the HEALPix 



^http://healpix.jpl. nasa.gov 
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transforms use 3L^ pixels. In the same comparison our FFT code uses about 4L^ pixels, 
the minimum number to fully support the band-limited function. If both the band-limit 
and the number of pixel were set the same, our algorithm would fare slightly better than 
shown here. 

We also implemented transforms for temperature and polarization Stokes par ameters 



(T, Q, U) which commonly occur in CMB analysis (see 
the differing phase convention for Yim)'- 



Zaldarriaga &: Seljak 



19971 . noting 



T{9,. 
{Q±iU){9,, 



7 ^ CLT,lm Ylmid, (/ 
Im 

/ ^ 0'±2,lm ±2^/m( 



(15) 



Im 



These quantities permit some additional symmetries: T is real-valued and a_2,/m = 
(— l)™a* 2 u_^-). In Fig. Owe compare an implementation of our algorithm to the 
temperature and polarization HEALPix transforms, which are more highly optimized than 
those for a general spin. Again the speed is comparable, this time with HEALPix slightly 
faster. 

In our implementation, roughly half the computation time is spent computing the 
Wigner A-functions. These use recursive algorithms, and may be computed during 
the course of the transform, or precomputed at memory cost of ~ L'^/6 floating point 
numbers. The method for computing the A functions may be selected at runtime. We have 



i mplem ented algorithms for the Wigner matrices from 



Risbol f ll996h and 



Trapani fc Navaza 



( 2006) (represente d inter nally at quadruple precision). In our implementation, the 



Trapani fc Navazal (|2006[ ) recursion is faster and more accurate to our maximum test 
bandlimit L = 4096, but this recursion eventually goes unstable at very high multipoles. If 
they are used at double precision, instead of quadruple, the transforms proceed about 10 
percent faster, but go unstable between L = 2048 and L = 4096. 
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The transform times listed in IWiaux et al.l (120071 ) for L < 1024 are comparable to 
ou rs, although these do no t include the significant prec omputation time. Methods based 



on 



DriscoU k Healyl dlQQJ), such as 



Wiaux et al 



(120071 ). should be much faster at higher 



band-limits because of the improved scaling, but the memory needs become impractical at 



presen t, requiring ~ 77 GB of memory for L = 4096 (scaling as L^ from the 
( 120071 ) memory value for their maximum band-limit, L = 1024). 



Wiaux et al 



Another strength of our approach is that we can re-use the same Wigner matrix to 
compute simultaneously several transforms with differing spins. Because the recursions 
are a significant portion of the computation, this can be much more efficient than running 
the transforms independently. Fig. H] shows the computation time for five spin transforms 
computed together. Our code can reuse the Wigner matrix recursion for each transform, 
in contrast to the HEALPix transforms, which must recompute the required recursions 
for each spin order. For a large number of transforms, such that the Wigner recursion 
is sub-dominant, our transforms proceed about twice as fast as if they were computed 
individually. 



3. Discussion 

Our algorithm is convenient because it implements all spin-s transforms in a single 
routine, and is particularly efficient if transforms at several spin orders have to be computed 
at the same time. Because of the representation of the spherical harmonic coefficients in 
terms of Wigner d- functions at a single argument, no additional computations of special 
functions are necessary if we would like to do several spin-s transform at once. For CMB 
work we have to compute spin and ±2 transforms for polarization, as well as spin 1 and 
spin 3 transforms to simulate lensing. Our generalization therefore presents economies of 
scale. 
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The time and memory requirements for the 0{L^) portion of the calculation depends 
only on the band limit and not the number of pixels. Unlike HEALPix, oversampling 
the function in real space costs very little, only a 2-dimensional FFT, so synthesizing or 
analyzing a map with a modest band limit at high resolution — zero padding the harmonic 
object — is relatively painless, and is typically bounded by the memory available, not the 
computation time. 

This has immediate application to the simulation of CMB lensing. Unlensed CMB maps 
are simulated as isotropic fields by taking the spherical harmonic transform of Gaussian 
random harmonic coefficients. The gravitational potential deflects the path of light rays 
from the CMB's last scattering surface. Hence, a pixel's light will not originate from that, 
or any other, pixel's center, complicating the evaluation of the spherical harmonics. A 
typical strate gy is to high ly oversample the unlensed map, then based on the deflection fleld 



remap pixels ( 



2008 



Lewis 



BasaketaL 



20051) or use an interpolation scheme (JHirata et al 



2004 



Das &: Bode 



20081). The alg orithm described here makes this easier at high resolution. 



In particular, iBasak et al. 



( 120081 ) uses our same method for the backward transform, and 



showed these interpolations can be carried out with v ery accurate. 



Hirata et al. 



y . Als o, th e harmonic 



fl2004h and 



Das fc Bode 



object Cn,m constructed as part of the interpolation in 

( 120081 ) by a Legendre transform followed by an FFT is equivalent our Gmm' (eq- [I3D for spin 

0, which has a more straightforward construction. 

Our method can also be used in conjunction with the ubiquitous HEALPix grid. 
Transforms by our method onto a sufficiently fine equiangular grid can be interpolated onto 
a HEALPix grid at high accuracy and a competitive speed. Using this technique helps to 
combine the advantages of our approach (very high accuracy, fast supersampling and hence 
interpolation, and improved transform performance for multiple spin-s transforms) with 
many elegant features of HEALPix, such as its hierarchical construction, and the equal area 
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and near-regular shape of the pixels. 



Conclusions 



Borrowing the same trick that enables fast convolution on the sphere (jWandelt fc Gorski 



2OOII ) and exploiting the relationship between Wigner d-functions and the spin-s spherical 
harmonics, we have presented a fast and exact (9(L^) algorithm which performs spherical 
harmonic transforms for several spin s functions with band-limit L using a single code. In 
particular, our method is computationally very efficient if several distinct spin transforms 
have to be computed because only one set of special functions are needed. The algorithm's 
scaling with the number of pixels is subdominant; the extra cost for transforming a 
band-limited function from harmonic space to a high-resolution, oversampled map (or vice 
versa) scales modestly like a two-dimensional FFT. This opens possibilities for efficient 
interpolation of spin-transformed fields at high resolution, which can be beneficial, for 
example, when simulating or analyzing gravitational lensing of the CMB. 



Some of the results in this paper have been derived using the HEALPix (1 Gorski et al 



20051 ) package. KMH receives support from NASA-JPL subcontract 1363745. BDW 
acknowledges support through NASA-JPL subcontracts 1236748 and 1371158 and NSF 
grant 07-08849 and thanks Caltech and JPL for hospitality while part of this work was 
being completed. 
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A. Symmetries of Wigner A-function 



A' 

{—m')m 
m' (—m) 

A' 



K, -■-/ '—^rn'm 



\l+m. 



■11 



I A 



(mirror first index) 
(mirror second index) 
(swap indices). 



(Al) 



B. Exact quadrature of integral /„ 



F{e, 



(Bl) 



We extend tlie function on the sphere in the 6 direction via 

/(^,0), e<Ti 

(-l)7(27r-0,0 + 7r), e>'K 
Because F is the same as / for < ^ < vr, F can take the place of / in the integral for I mm' 
without changing its value. This is convenient because F (but not /) can be written as the 
band limited Fourier sum 



F{9,(f))= y^ Ffc„exp(zA;6') exp(m(; 



(B2) 



kn=—L 



Substituting this into the expression for Im'm (Eq. E]), we can rearrange to yield 



^m'm 



7 ^ Fkn 



kn=—L 



2n 



>exp {i{n — in)(f)) 



dO exp {i{k — m')6) sin 6 



(B3) 



The first integration, over 0, is simply 27c6nm.- The second integration, over 6, can be 
evaluated to yield a weight for the sum over the Fourier coefficients. The result is 



Im'm = 27r ^ Fkmw{k - m'), 



(B4) 



k=-L 



where 



wip) 



d6 exp{ip6) sin 6 = < 



0, p even 

2/(1 -p2), podd,p^±l 
^ TiTT/2, p = ±l 



(B5) 



-15- 

Note that flB4|) is written in the form of a discrete convolution in Fourier space. It therefore 
can be evaluated as an multiplication in real space, so that Im'm is the discrete Fourier 
transform of 2TTWrF, where 

7V^/2-l 

Wriq'M) = Y^ exp{ipq'Ae)w{p) (B6) 

is the real- valued quadrature weight (see Fig. \5^ at the sampled 6 values. This gives weight 
to the pixels different than the sin 6 weight appearing in a naive Riemann sum, in particular 
giving weight to the poles which would have none in the naive scheme. The notation 
Nq denotes the number of samples in the function F extended in 6, which depending on 
the pixel implementation, will be about twice the number of samples Ng of /. In our 
implementation A^^ = 2{Ng — 1). 

Therefore we finally have 
Im'm = -^Tj^ Y. Yl exp{-im'q'A9)exp{-imqA^)wr{q'A9)F{q'A9,qA(P) (B7) 



where 



A9 = ^ A<p=— (B8) 



To summarize, Im'm can be evaluated exactly by tabulating the weights (via a 
one-dimensional FFT), multiplying the extended function by them, then computing a 
two-dimensional FFT. 
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spin 13 transforms 



le-13 ■ 




le-14 



256 



512 



This work 



1024 
Band-limit L 



2048 



4096 



Fig. 1 . — Accuracy of a backward- forward transform pair for Gaussian random white noise. 
HEALPix transform pairs (without iteration) yield an accuracy ~ 10^^ in the same test. 
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B 



uu 


spin 13 transforms ^/^ ,,^^ 




y^ ^'^ 


00 


y^^ ::^jf^ 


10 


y^ ,.^^^ 




,,r'"^>^ 


1 


-^[/^ HEALPix— backward 




./' y>^ forward ► 




'' J'^^ This work— backward • 


^ 1 


y^ forward — • — 



256 



512 



1024 
Band-limit L 



2048 



4096 



Fig. 2. — Speed comparisons for spin-13 transforms of random spherical harmonic coeffi- 
cients. HEALPix routines use parameter A^side = -Z^/2. The fast spin transform was run with 
the minimum pixel size necessary to support the bandwidth, as described in the text. 
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Fig. 3. — Similar to figureEj but for the harmonic transforms of temperature and polarization 
Stokes parameters, commonly used in the analysis of CMB data. 
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10 



0.1 



spin -3, -2, 0, 1, 17 forward transforms 




256 



HEALPix 

This work 



512 1024 

Band-limit L 



2048 



Fig. 4. — Computing five different spin transforms (forward transform only). Tlie HEALPix 
transforms are computed successively. Our algoritlim reuses tfie same Wigner rf-function 
recursion for all transforms, leading to a substantial speed-up. 
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0.5 



1 

e/7i 



1.5 



Fig. 5. — Quadrature weights Wr{0) when the number of samples of the extended function F 
is Ng = 14 {Ng = 8), plotted against sin(^) for 6 < tc, the weight function in the continuum 
limit. 
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